-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ibis_cloud_spanner folder added #186
Conversation
|
||
for item in schema_list: | ||
field_name = item.name | ||
if(item.type_.code == 8): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of 8, we can use TypeCode.ARRAY
https://googleapis.dev/python/spanner/latest/_modules/google/cloud/spanner_v1/types/type.html#TypeCode
db_id = self.dataset_id | ||
database = instance.database(db_id) | ||
with database.snapshot() as snapshot: | ||
query="select * from {} limit 1".format(self.table_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect this would still give you the schema with LIMIT 0
.
# return '{}.`{}`'.format(arg_formatted, field) | ||
|
||
|
||
def _array_concat(translator, expr): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These methods are copy-pasted from BigQuery, correct? I'd much rather see a much smaller compiler module that inherits from the BigQuery connector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we have taken these methods from BigQuery as they also work for cloud spanner. But there are two methods namely "_array_index" and "_regex_extract" who have slightly differnt logic in cloud spanner.
So I have updated this "compiler.py" file which now inherits from the BigQuery connector and contains the above two mentioned methods.
|
||
|
||
|
||
__all__ = ('compile', 'connect', 'verify', '' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the trailing empty string(s) ''
about? Seems like these should be removed.
from third_party.ibis.ibis_cloud_spanner import table | ||
|
||
from google.cloud import spanner | ||
from google.cloud import spanner |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see at least three from google.cloud import spanner
imports. These can be consolidated.
from functools import partial | ||
|
||
import numpy as np | ||
import regex as re |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused as to why we aren't using the built-in re
module. If we are using a third-party library, then we really shouldn't be renaming it to something that conflicts with the built-in.
import google.cloud.spanner as cs | ||
from google.cloud import spanner | ||
import pandas as pd | ||
import regex as re |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why aren't we using the re module built-in? https://docs.python.org/3/library/re.html
@@ -0,0 +1,354 @@ | |||
# Copyright 2021 Google Inc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you copy the interface from the Table
class in googleapis/python-spanner#219 instead? That way we can switch more easily to it once it has merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you be more descriptive on this? According to our understanding, are you asking us to include functions like "_get_schema", "_exists" ,etc. which are present in Table class in "googleapis/python-spanner#219" and absent in our Table class version?
""" | ||
return table.TableReference(self, table_id) | ||
|
||
class DatasetReference(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remind me why this class is necessary? Can't we just use Database
objects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class required as we are using "TableReference" class which takes the object of "DatasetReference" class as an argument.
As representation of table object is like:- Table(TableReference(DatasetReference))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Tim and the enitre Google Cloud Spanner team for constant commitment and support throughout the affiliation.
# limitations under the License. | ||
|
||
|
||
"""CloudScanner public API.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""CloudScanner public API.""" | |
"""Cloud Spanner public API.""" |
self.database_name = self.instance.database(database_id) | ||
( | ||
self.data_instance, | ||
self.billing_instance, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spanner doesn't support cross-instance queries in the way that BigQuery does. There isn't a separation between "data" and "billing" instance.
We do not need the same "parse" logic that we have in BigQuery.
|
||
|
||
""" | ||
self.spanner_client = spanner.Client() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want a project
parameter passed in here. It can be an optional argument to connect
.
See: https://googleapis.dev/python/spanner/latest/client-api.html#google.cloud.spanner_v1.client.Client
|
||
def _fully_qualified_name(self, name, database): | ||
instance, dataset = self._parse_instance_and_dataset(database) | ||
return "{}".format(name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
???
Wouldn't this whole function be simplified to return name
?
This is consistent with the BigQuery Ibis connector
"ibis_cloud_spanner" folder has been added under the branch named "ibis-cloud-spanner". This folder doesn't include "tests" folder. Also the "compiler.py" file contains only the basic code, the remaining functionalities are yet to be added.